TECHNICAL PROGRESS REPORT

NUMBER  4 

Topic Number: SDIO 88-10

Title:  Three  Dimensional  Cellular  Automata  for  Subpixel  Target  Detection 

Contract  Number:  N00014-88-C-0717 

From: Kensal Consulting, Tucson, Arizona (Code: 0D9C9)

To:  Dr.  Keith  Bromley,  NOSC,  San  Diego  (Code:  N00014) 

Project  Description: 

This  project  on  subpixel  target  detection  relates  to  research  in  the  optimization 
of  three-dimensional  computing  structures  for  use  in  target  detection  and  to 
research  in  the  reduction  of  an  optimum  computing  structure  to  an 
efficiently-designed  silicon  chip. 

Technical  Progress: 

During  January  this  project  continued  with  additional  work  on  the  subject 
matter  discussed  in  Technical  Progress  Report  Number  1,  i.e.,  the  mathematical 
optimization  of  planar  structures  for  executing  cellular  logic  transforms  based  on 
the  criterion  of  maximizing  pixops  (picture  point  operations)  per  device.  Whereas 
in  our  initial  work  optimization  had  been  based  on  a  constant  window  size  in  the 
512x512  field,  this  new  study  addressed  the  subject  of  variable  size  and  variable 
aspect  ratio  data  windows.  The  purpose  of  the  study  is  to  obtain  the  most 
efficient  use  of  silicon  in  designing  a  chip  for  target  detection  computations  in 
conjunction  with  our  subcontractor  Visual  Information  Technologies  (Texas). 

In the studies undertaken in January, four configurations were examined. Since
the equations treating these configurations are non-linear, arithmetic (numerical)
methods were used to obtain optimization results (instead of employing algebraic
equations and the differential calculus). The cases studied span the range from a
configuration where the look-up table (LUT) memory was considerably larger than the
data window memory to the opposite, i.e., where the LUT memory was considerably
smaller than the data window memory. These four cases will be taken up separately.
In all cases it is assumed that the chip is addressed in a byte mode with a byte
load time (or unload time) of 0.1 us. Also, in all cases, it was assumed that there
would be four devices per memory cell and, as before, that the memory for the
window data was triply redundant and the data field itself always 512x512.

Case  1 

The first case considered had the following parameters:

Parameter              Value

LUT Memory             8x512x4 = 16,384 devices
Window Data Memory     3x256x4 = 3,072 devices
Total Load Time        (256/8) x 1E-7 s = 3.2 us
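The parameter arithmetic above can be sketched in a few lines of Python. This is only an illustration of the stated assumptions (four devices per memory cell, triply redundant window memory, 0.1 us per byte loaded); the function and constant names are hypothetical and do not come from the report.

```python
# Illustrative sketch of the parameter arithmetic; all names are hypothetical.
DEVICES_PER_CELL = 4   # four devices per memory cell (stated assumption)
REDUNDANCY = 3         # window data memory is triply redundant
BYTE_TIME_S = 1e-7     # 0.1 us to load (or unload) one byte


def lut_devices(luts=8, entries=512):
    """Devices in the LUT memory, using the 8x512x4 factors as tabulated."""
    return luts * entries * DEVICES_PER_CELL


def window_devices(window_bits):
    """Devices in the triply redundant window data memory."""
    return REDUNDANCY * window_bits * DEVICES_PER_CELL


def load_time_us(window_bits):
    """Time to load the window memory one byte at a time, in microseconds."""
    return (window_bits / 8) * BYTE_TIME_S * 1e6


# Case 1: a 256-bit window data memory.
print(lut_devices(), window_devices(256), round(load_time_us(256), 1))
```

The same three functions apply unchanged to any other window memory size by passing a different value of `window_bits`.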

Since information from the window data memory used to address the LUTs
must come from three rows, the minimum window height is 3. By the same token,
using a byte-loaded device, the minimum window width is 3 bytes (24 pixels) in
order to solve the border overlap problem in processing eight columns. The results
for this case are given in the tabulation below, which lists only the number of
rows loaded (window height), the pixop rate per device, and the total processing
time for the 512x512 field. Note that the window width (in pixels) is simply the
size of the window data memory (256 pixels) divided by the window height and
adjusted to be an integral number of bytes.


Window Height     Pixop Rate/Device     Processing Time (ms)

      3                5.1E2                 26
      4                7.5E2                 18
      5                7.7E2                 17 (optimum)
      8                7.7E2                 17
     10                5.2E2                 25

The  Processing  Time  is  given  in  milliseconds.  Results  are  plotted  in  Figure  1. 
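The window widths implied by the tabulated heights can be sketched as follows. The report says only that the width is "adjusted to be an integral number of bytes"; rounding down (so the window never exceeds the memory) is an assumption made here, and the function name is hypothetical.

```python
# Illustrative sketch; rounding down to whole bytes is an assumption.
BITS_PER_BYTE = 8
MIN_WIDTH_BYTES = 3    # minimum window width of 3 bytes (24 pixels), as stated


def window_width_pixels(memory_pixels, height):
    """Window width in pixels for a given window height.

    The memory size is divided by the height and the result is rounded
    down to a whole number of bytes (assumed rounding direction).
    """
    width_bytes = (memory_pixels // height) // BITS_PER_BYTE
    if width_bytes < MIN_WIDTH_BYTES:
        raise ValueError("window narrower than the 3-byte minimum")
    return width_bytes * BITS_PER_BYTE


# Case 1 (256-pixel window data memory), for the tabulated heights.
for h in (3, 4, 5, 8, 10):
    print(h, window_width_pixels(256, h))
```

Under this assumption a height of 10 gives exactly the 24-pixel minimum width, and a height of 11 would fall below it, which is consistent with the Case 1 tabulation stopping at a window height of 10.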


Case  2 


The second case considered assumed a 2048-bit window data memory, leading to
the following parameters:


Parameter              Value

LUT Memory             8x512x4 = 16,384 devices
Window Data Memory     3x2048x4 = 24,576 devices
Total Load Time        (2048/8) x 1E-7 s = 25.6 us


These  parameters  led  to  the  following  results: 


Window Height     Pixop Rate/Device     Processing Time (ms)

      4                4.9E2                 -
      7                6.1E2                 -
     14                7.3E2                 8.8 (optimum)
     25                6.8E2                 -
     42                6.0E2                 -
     64                4.3E2                 -
     85                2.8E2                 -

In the above tabulation only the optimum processing time is shown. All other
results are displayed in Figure 2. It can be seen that for this case more than one
graph is shown, namely, graphs for c=1, c=2, etc. The symbol "c" represents the
number of reentrant recirculations of the data. In Case 1, recirculation was
infeasible. As can be seen, recirculation by two cycles (c=2) yields a somewhat
higher pixop rate and, therefore, a better processing time than no recirculation
(c=1). This improvement, however, is not particularly dramatic in comparison with
the improvement in optimum processing time from 17 ms to 8.8 ms.
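The report does not state how processing time is derived from the pixop rate per device. One relation that approximately reproduces the tabulated optima for Cases 1 and 2, offered here as an assumption rather than as the author's formula, is the number of pixels in the field divided by the product of the rate per device (taken as pixops per second) and the total device count (LUT plus window data memory). All names below are hypothetical.

```python
# Illustrative sketch; the relation itself is inferred, not stated in the report.
FIELD_PIXELS = 512 * 512       # the data field is always 512x512
LUT_DEVICES = 8 * 512 * 4      # LUT memory devices, as tabulated


def processing_time_ms(rate_per_device, window_bits):
    """Estimated time for one pass over the field, in milliseconds.

    Assumes rate_per_device is in pixops per second per device and that
    the relevant device count is LUT devices plus window memory devices.
    """
    total_devices = LUT_DEVICES + 3 * window_bits * 4
    return FIELD_PIXELS / (rate_per_device * total_devices) * 1e3


# Optimum rows of the Case 1 and Case 2 tabulations.
print(round(processing_time_ms(7.7e2, 256), 1))    # compare with 17 ms
print(round(processing_time_ms(7.3e2, 2048), 1))   # compare with 8.8 ms
```

The computed values agree with the tabulated 17 ms and 8.8 ms to within the rounding of the two-digit pixop rates.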




Case  3 


In the third case, the window memory was enlarged to 8192 bits, leading to the
following parameters:


Parameter              Value

LUT Memory             8x512x4 = 16,384 devices
Window Data Memory     3x8192x4 = 98,304 devices
Total Load Time        (8192/8) x 1E-7 s = 102.4 us


In this case load/unload time dominates and the pixop rate per device
decreases. Since, however, there are significantly more devices, one might expect
the processing time to improve further. This is not the case, as is shown
in the table below. (Again, only the optimum time is shown.)


Window Height     Pixop Rate/Device     Processing Time (ms)

     16                3.2E2                 -
     31                3.1E2                 7.4 (optimum)
     56                2.8E2                 -
    102                2.4E2                 -
    170                1.7E2                 -
    256                1.2E2                 -
    342                0.9E2                 -

Results are plotted in Figure 3. As in Figure 2, recirculation was studied for the
values c=1, 2, 4, and 8. Because the window data memory was considerably larger,
recirculation by eight cycles caused an improvement in the total pixop time per
device and, therefore, would improve the total time per field.

Once  more,  the  improvement  is  by  a  relatively  small  factor. 


Case  4 


The final case studied enlarged the window data memory even further, to
32,768 bits. This yielded the following parameters:


Parameter              Value

LUT Memory             8x512x4 = 16,384 devices
Window Data Memory     3x32768x4 = 393,216 devices
Total Load Time        (32768/8) x 1E-7 s = 409.6 us


Analysis  of  this  case  led  to  the  following  results. 


Window Height     Pixop Rate/Device     Processing Time (ms)

     64                8.7E1                 7.4 (optimum)
    120                7.8E1                 -
    240                6.4E1                 -
    409                4.9E1                 -
    512                7.1E1                 -

This case is of interest since, as shown in Figure 4, the curves for both c=1 and
c=2 show an initial drop in pixop rate per device as the window height is increased
from 64 to 120, followed by a recovery as window height is further increased. The
overall processing time is essentially the same as for both Cases 2 and 3, indicating
that there is very little value in placing large window data memories on chip.

Conclusion 

The conclusion of this parametric study is quite simple. At least for the
planar processor, the total processing time of a 512x512 data field can be reduced
somewhat by enlarging the window data memory from 256 to 2048 bits. Beyond
that, little or nothing is gained and a great deal is lost in terms of the extra
silicon employed. These results have been transmitted to Visual Information
Technologies and we are now studying the implications of these results as regards
the three-dimensional track detection processor described in Technical Progress
Report Number 2.
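
The trade-off behind this conclusion can be made concrete with a short calculation that compares each case's optimum processing time and total device count against Case 1. The figures are taken from the four cases above; treating the sum of LUT and window memory devices as the measure of silicon is an assumption of this sketch, and the names are hypothetical.

```python
# Illustrative sketch; "silicon" is approximated by total device count.
LUT_DEVICES = 8 * 512 * 4

# case number -> (window data memory in bits, optimum processing time in ms)
CASES = {
    1: (256, 17.0),
    2: (2048, 8.8),
    3: (8192, 7.4),
    4: (32768, 7.4),
}


def total_devices(window_bits):
    """LUT devices plus triply redundant window memory devices."""
    return LUT_DEVICES + 3 * window_bits * 4


def relative_to_case_1(case):
    """Return (speedup, silicon ratio) of a case relative to Case 1."""
    base_bits, base_ms = CASES[1]
    bits, ms = CASES[case]
    speedup = base_ms / ms
    silicon = total_devices(bits) / total_devices(base_bits)
    return speedup, silicon


for case in CASES:
    speedup, silicon = relative_to_case_1(case)
    print(f"Case {case}: {speedup:.2f}x faster, {silicon:.2f}x the devices")
```

On these figures Case 2 roughly doubles both the speed and the device count, whereas Case 4 needs about 21 times the devices of Case 1 for a speedup of about 2.3.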